Prediction of integral membrane protein type by collocated hydrophobic amino acid pairs
Abstract
A computational model, IMP-TYPE, is proposed for classifying integral membrane proteins into five types from their sequences. The model aims not only to provide accurate predictions but, more importantly, to incorporate interesting and transparent biological patterns. Compared with the best-performing existing models, IMP-TYPE reduces their error rates by 19% and 34% in two out-of-sample tests performed on benchmark datasets. Our empirical evaluations also show that the proposed method provides even larger improvements, i.e., error rate reductions of 29% and 45%, when predicting sequences that share low (40%) identity with sequences in the training dataset. We also show that IMP-TYPE can be used in a standalone mode: it reproduces a significant majority of the correct predictions made by other leading methods, while also correctly classifying sequences that those methods misclassify. Our method computes predictions using a Support Vector Machine classifier whose input is a feature-based encoding of the sequence. The input feature set includes hydrophobic amino acid (AA) pairs, which were selected using a consensus of three feature selection algorithms. Several recent studies of integral membrane proteins have associated the hydrophobic residues that make up these AA pairs with the formation of transmembrane helices. Our study also indicates that Met and Phe display a degree of hydrophobicity that may be more crucial than their polarity or aromaticity when they occur in transmembrane segments. This conclusion is supported by a recent study of the potential of mean force for membrane protein folding and by a study of membrane-propensity scales for amino acids.
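The abstract names three computational ingredients: collocated hydrophobic AA pair features, a consensus over several feature selectors, and error rates compared as relative reductions. The sketch below illustrates each under stated assumptions; it is not the authors' implementation. The hydrophobic alphabet `HYDROPHOBIC`, the gap range `max_gap`, the majority-vote rule `min_votes`, and the helper names `collocated_pair_features`, `consensus_select`, and `relative_error_reduction` are all hypothetical, since the abstract does not specify them. A "collocated" pair is taken here to mean residues at positions i and i + g + 1, i.e. separated by g intervening residues.

```python
from itertools import product

# ASSUMPTION: an illustrative hydrophobic alphabet, not the paper's exact set.
HYDROPHOBIC = "AFILMVW"


def collocated_pair_features(seq: str, max_gap: int = 2) -> dict[str, float]:
    """Hypothetical encoder: frequency of each hydrophobic pair (a_i, a_{i+g+1})
    for every gap g = 0..max_gap (g = 0 means adjacent residues)."""
    seq = seq.upper()
    features: dict[str, float] = {}
    for gap in range(max_gap + 1):
        step = gap + 1
        n_pairs = max(len(seq) - step, 0)  # position pairs available at this gap
        counts = {a + b: 0 for a, b in product(HYDROPHOBIC, repeat=2)}
        for i in range(n_pairs):
            pair = seq[i] + seq[i + step]
            if pair in counts:  # pairs containing a non-hydrophobic residue are skipped
                counts[pair] += 1
        for pair, c in counts.items():
            # Normalise by the number of position pairs so lengths are comparable.
            features[f"{pair}_g{gap}"] = c / n_pairs if n_pairs else 0.0
    return features


def consensus_select(selections: list[set[str]], min_votes: int = 2) -> set[str]:
    """ASSUMPTION: keep a feature if at least `min_votes` selectors chose it."""
    votes: dict[str, int] = {}
    for chosen in selections:
        for name in chosen:
            votes[name] = votes.get(name, 0) + 1
    return {name for name, v in votes.items() if v >= min_votes}


def relative_error_reduction(old_error: float, new_error: float) -> float:
    """Fraction of the baseline error removed: (old - new) / old."""
    return (old_error - new_error) / old_error


if __name__ == "__main__":
    f = collocated_pair_features("LLKL", max_gap=1)
    print(f["LL_g0"], f["LL_g1"])
    print(consensus_select([{"a", "b"}, {"b", "c"}, {"b", "a", "d"}]))
    # Hypothetical numbers: a baseline error of 20% falling to 16.2%
    # is a 19% relative reduction.
    print(relative_error_reduction(0.20, 0.162))
```

In this reading, a "19% error rate reduction" is relative to the competing method's error, not a drop of 19 percentage points. The resulting feature vectors (restricted to the consensus-selected pairs) would then be passed to an SVM classifier, for example scikit-learn's `SVC`; that step is not shown here.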
Related articles
Multipass membrane protein structure prediction using Rosetta.
We describe the adaptation of the Rosetta de novo structure prediction method for prediction of helical transmembrane protein structures. The membrane environment is modeled by embedding the protein chain into a model membrane represented by parallel planes defining hydrophobic, interface, and polar membrane layers for each energy evaluation. The optimal embedding is determined by maximizing th...
Prediction of hydrophobic regions effectively in transmembrane proteins using digital filter
The hydrophobic effect is the major factor that drives a protein molecule towards folding and, to a great degree, the stability of protein structures. Therefore, knowledge of hydrophobic regions and their prediction is of great help in understanding the structure and function of the protein. Hence, determining membrane-buried regions is a computationally intensive task in bioinformatics. Seve...
Topology Prediction of Membrane Proteins Based on a Modified “Positive-Inside” Rule
It is difficult to determine the 3D structure of integral membrane proteins by experimental techniques. However, theoretical prediction of the secondary structures from amino acid sequences appears much easier for intrinsic membrane proteins than soluble proteins, because of steric constraints from membrane structure and weak hydrophobic interaction within the lipid bilayer. Then, the next step...
A novel human apolipoprotein (apoM).
A novel human apolipoprotein designated apolipoprotein M (apoM) is described. The unique N-terminal amino acid sequence of apoM was found in an approximately 26-kDa protein present in a protein extract of triglyceride-rich lipoproteins (TGRLP). The isolated apoM cDNA (734 base pairs) encoded a 188-amino acid residue-long protein, distantly related to the lipocalin family. The mRNA of apoM was d...
Journal title:
- Journal of Computational Chemistry
Volume 30, Issue 1
Publication date: 2009